Re-Ordered FEGC and Block Based FEGC for Inverted File Compression
Abstract
Data compression is widely used in Information Retrieval applications such as web search engines and digital libraries to speed up data retrieval. In these applications, universal codes (Elias codes (EC), Fibonacci code (FC), Rice code (RC), Extended Golomb code (EGC), Fast Extended Golomb code (FEGC), etc.) are generally preferred over statistical codes (Huffman codes, arithmetic codes, etc.) because universal codes are easier to construct and decode. In this paper, the authors propose two methods for constructing universal codes, based on the ideas used in Rice code and Fast Extended Golomb code. The first method, Re-ordered FEGC (RFEGC), is suitable for representing small, middle and large range integers, whereas Rice code works well only for small and middle range integers. RFEGC is also competitive with FC, EGC and FEGC in representing small, middle and large range integers, and it could be faster to decode than those three codes. The second coder, Block based RFEGC, uses a local divisor rather than a global divisor to improve the performance (both compression and decompression) of RFEGC. To evaluate the coders, the authors apply them to the integer values of inverted files constructed from the TREC, Wikipedia and FIRE collections. Experimental results show that the coders achieve better performance (both compression and decompression) on files that contain a significant proportion of middle and large range integers.
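The abstract does not give the construction of RFEGC or Block based RFEGC, so the following is only a minimal sketch of the general idea it refers to: a standard Rice code applied to a list of d-gaps, once with a single global divisor and once with a divisor chosen per block. The function names, the block size of 4, the rule for picking the parameter `k` (divisor `2**k` near the mean) and the sample gaps are all assumptions made for illustration; they are not taken from the paper.

```python
def rice_encode(n, k):
    """Rice code of an integer n >= 1 with divisor 2**k, as a bit string."""
    n -= 1                                  # shift so that 1 maps to 0
    q, r = n >> k, n & ((1 << k) - 1)       # quotient and remainder
    bits = "1" * q + "0"                    # quotient in unary
    if k:
        bits += format(r, "0{}b".format(k)) # remainder in k binary digits
    return bits


def rice_decode(bits, k, count):
    """Decode `count` integers from a Rice-coded bit string."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":             # read the unary quotient
            q += 1
            pos += 1
        pos += 1                            # skip the terminating 0
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append((q << k) + r + 1)
    return out


def pick_k(values):
    """Assumed heuristic: choose k so that 2**k is close to the mean."""
    mean = sum(values) / len(values)
    return max(0, int(mean).bit_length() - 1)


def encode_global(values):
    """One divisor for the whole list."""
    k = pick_k(values)
    return k, "".join(rice_encode(v, k) for v in values)


def encode_blocks(values, block_size=4):
    """One divisor per block; each block keeps its own k and length."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        k = pick_k(chunk)
        blocks.append((k, len(chunk), "".join(rice_encode(v, k) for v in chunk)))
    return blocks


def decode_blocks(blocks):
    out = []
    for k, n, bits in blocks:
        out.extend(rice_decode(bits, k, n))
    return out


# Hypothetical d-gaps: a run of small values followed by a run of large ones.
gaps = [1, 2, 1, 3, 900, 1200, 750, 1024]
global_k, global_bits = encode_global(gaps)
blocks = encode_blocks(gaps)
print("global payload bits:", len(global_bits))
print("block payload bits: ", sum(len(b) for _, _, b in blocks))
```

With a global divisor, the small gaps pay for a remainder field sized for the large ones; a per-block divisor avoids that at the cost of storing one parameter per block, which this sketch keeps outside the bit string and does not count.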
Similar resources
Genetic and Physiological Analysis of Iron Biofortification in Maize Kernels
BACKGROUND: Maize is a major cereal crop widely consumed in developing countries, which have a high prevalence of iron (Fe) deficiency anemia. The major cause of Fe deficiency in these countries is inadequate intake of bioavailable Fe, where poverty is a major factor. Therefore, biofortification of maize by increasing Fe concentration and/or bioavailability has great potential to alleviate this ...
Can compatible discretization, finite element methods, and discrete Clifford analysis be fruitfully combined?
This paper describes work in progress, towards the formulation, implementation and testing of compatible discretization of differential equations, using a combination of Finite Element Exterior Calculus and discrete Geometric Calculus / Clifford analysis. Much work has been done in the two seemingly separate areas of the Finite Element Method and Geometric Calculus for over 42 years, and the fi...
Re-Pair Compression of Inverted Lists
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers. In this paper we explore a completely different alternative: We use Re-Pair compression of those differences. While Re-Pair by itself offers fast decompressi...
SASE: Implementation of a Compressed Text Search Engine
Keyword based search engines are the basic building block of text retrieval systems. Higher level systems like content sensitive search engines and knowledgebased systems still rely on keyword search as the underlying text retrieval mechanism. With the explosive growth in content, Internet and Intranet information repositories require efficient mechanisms to store as well as index data. In this...
Cluster based Mixed Coding Schemes for Inverted File Index Compression
One way to improve inverted file compression is to use the cluster property [1] of document collection, which states that term occurrences are not uniformly distributed. Some terms are more frequently used in some parts of the collection than in others. The corresponding part of the inverted list will consequently be small d-gap values clustered. Interpolative code [9] exploits the cluster prop...
Journal: IJIRR
Volume: 3
Publication date: 2013